Selecting Labels for News Document Clusters
Authors
Abstract
This work addresses the selection of meaningful, terse labels for news document clusters. We analyze several alternatives for selecting headlines and/or sentences from the documents in a cluster (obtained as the result of an entity-event-duration query), and formalize an approach for extracting a short phrase from well-supported headlines or sentences of the cluster to serve as the cluster label. Our technique maps a sentence to a set of significant stems that approximates its semantics for comparison. Finally, the cluster label is extracted from a selected headline or sentence as a contiguous sequence of words, which restores the word-order information lost in the formalization of semantic equivalence.
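The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the stopword list, the suffix-stripping `crude_stem`, the Jaccard-based notion of "support", and the thresholds `min_similarity` and `min_share` are all assumptions made here for illustration. The sketch maps each headline to a set of stems, picks the headline whose stem set is matched by the most other headlines, and then cuts a contiguous run of words from a headline as the label.

```python
import re
from collections import Counter

# Assumed stopword list; the abstract does not say how "significant" is decided.
STOPWORDS = frozenset({"a", "an", "the", "of", "on", "in", "to", "for",
                       "and", "after", "by", "with", "at"})


def crude_stem(word):
    """Naive suffix stripping, a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def significant_stem(token):
    """Return the stem of a token, or None for stopwords and non-words."""
    match = re.search(r"[a-z]+", token.lower())
    if match is None or match.group() in STOPWORDS:
        return None
    return crude_stem(match.group())


def stem_set(sentence):
    """Approximate a sentence by its set of significant stems (order is lost)."""
    stems = (significant_stem(t) for t in sentence.split())
    return frozenset(s for s in stems if s is not None)


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_supported(headlines, min_similarity=0.5):
    """Pick the headline whose stem set is close to the most other headlines."""
    sets = [stem_set(h) for h in headlines]

    def support(i):
        return sum(1 for j, other in enumerate(sets)
                   if j != i and jaccard(sets[i], other) >= min_similarity)

    return headlines[max(range(len(headlines)), key=support)]


def extract_label(headline, headlines, min_share=0.5):
    """Cut the contiguous word span of `headline` bounded by frequent stems."""
    counts = Counter(stem for h in headlines for stem in stem_set(h))
    frequent = {s for s, c in counts.items() if c >= min_share * len(headlines)}
    words = headline.split()
    hits = [i for i, w in enumerate(words) if significant_stem(w) in frequent]
    if not hits:
        return headline
    return " ".join(words[hits[0]: hits[-1] + 1])


headlines = [
    "Senate passes budget bill after long debate",
    "Budget bill passes Senate",
    "Senate passed the budget bill on Friday",
    "Critics slam Senate budget bill",
]
selected = best_supported(headlines)
print(extract_label(selected, headlines))
```

Comparing stem sets discards word order, which is why the label is taken from the original word sequence of a headline rather than assembled from the stems themselves.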
Similar resources
Quantifying WiMAX Performance
From a research point of view, the task of text clustering presents a great challenge, especially in a multilingual context. While a number of document-clustering techniques exist, they all lack the fundamental ability to provide sensible descriptions (labels) of the output document groups. This has been the primary focus of the Carrot2 project – to extract sensible groups of documents on relat...
Semi-Supervised Events Clustering in News Retrieval
The presentation of news articles to meet research needs has traditionally been a document-centric process. Yet users often want to monitor developing news stories based on an event, rather than by examining an exhaustive list of retrieved documents. In this work, we illustrate a news retrieval system, eventNews, and an underlying algorithm which is event-centric. Through this system, news arti...
Correlated Concept based Topic Updation Model for Dynamic Corpora
The rapid growth of documents available on the Internet, in digital libraries, medical records, news wires and other scientific document corpora has motivated researchers to propose many text mining techniques that help users quickly retrieve, trace and summarize information effectively. Topic detection is one such technique which discovers precise, meaningful and concise labels...
Extending k-means with the description-comes-first approach
This paper describes a technique for clustering large collections of short and medium length text documents such as press articles, news stories and the like. The technique called description comes first (DCF) consists of identification of related document clusters, selection of salient phrases relevant to these clusters and reallocation of documents matching the selected phrases to form final ...
A New Document Embedding Method for News Classification
Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into pre-defined categories. A large volume of news spreads on the web. A text classifier can categorize news automatically, which facilitates and accelerates access to the news. The first step in text classification is to represent documents in a suitable way t...